Fast Class-Attribute Interdependence Maximization (CAIM) Discretization Algorithm

Authors

Lukasz A. Kurgan

Krzysztof J. Cios

Abstract

Discretization is the process of converting a continuous attribute into an attribute that contains a small number of distinct values. One of the major reasons for discretizing an attribute is that some machine learning algorithms perform poorly with continuous attributes and thus require front-end discretization of the input data. The paper describes a Fast Class-Attribute Interdependence Maximization (F-CAIM) algorithm that is an extension of the original CAIM algorithm. The algorithm works with supervised data by maximizing the class-attribute interdependence. F-CAIM’s improvement over the CAIM algorithm is a significant reduction in the computational time required to discretize the data. It retains all of CAIM’s advantages: fully automated generation of a possibly minimal number of discrete intervals, the highest class-attribute interdependency when compared with other discretization algorithms, and improved performance of machine learning algorithms that are subsequently used on the discretized data. We present results based on extensive benchmarking tests of F-CAIM, CAIM and six other state-of-the-art discretization algorithms. The tests use eight well-known machine learning datasets consisting of continuous and mixed-mode attributes. They show that F-CAIM’s speed is comparable to the speed of the simplest unsupervised algorithms and better than that of other supervised discretization algorithms.
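To make "maximizing the class-attribute interdependence" concrete, the following is a minimal Python sketch of the CAIM criterion as defined in the original CAIM paper: for each interval, square the count of its most frequent class, divide by the interval size, sum over intervals, and divide by the number of intervals. The function name `caim_score` and its parameters are hypothetical, the sketch assumes every interval contains at least one value, and it shows only the scoring of a candidate set of cut points, not the search procedure or the speed-up that F-CAIM introduces.

```python
from bisect import bisect_left
from collections import Counter


def caim_score(values, labels, boundaries):
    """Score a candidate discretization with the CAIM criterion.

    values     -- continuous attribute values
    labels     -- class label for each value
    boundaries -- sorted inner cut points; interval r covers
                  (boundaries[r-1], boundaries[r]]
    (All names here are illustrative, not taken from the paper.)
    """
    # One class-frequency table per interval.
    counts = [Counter() for _ in range(len(boundaries) + 1)]
    for v, y in zip(values, labels):
        counts[bisect_left(boundaries, v)][y] += 1

    # Sum of (largest class count)^2 / (interval size) over non-empty intervals.
    total = sum(max(c.values()) ** 2 / sum(c.values()) for c in counts if c)

    # Dividing by the number of intervals penalizes overly fine schemes.
    return total / len(counts)
```

For the four values 1, 2, 3, 4 with classes a, a, b, b, a single cut at 2.5 scores 2.0, while both the uncut attribute and a scheme with one interval per value score 1.0, so the criterion favors the cut that separates the classes with the fewest intervals.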

Similar resources

Discretization Algorithm that Uses Class-Attribute Interdependence Maximization

Most of the existing machine learning algorithms are able to extract knowledge from databases that store discrete attributes (features). If the attributes are continuous, the algorithms can be integrated with a discretization algorithm that transforms them into discrete attributes. The paper describes an algorithm, called CAIM (class-attribute interdependence maximization), for discretization o...

A Novel Tree Based Classification

Classification is a data mining (DM) technique used to predict or forecast unknown information using historical data. There are many classification techniques. ID3 is a very popular tree-based classification algorithm for categorical data that does not support continuous data. The attribute selection process plays a major role in building a classification tree model. Attribute Selection in...

A Discretization Algorithm for Uncertain Data

This paper proposes a new discretization algorithm for uncertain data. Uncertainty is widely spread in real-world data. Numerous factors lead to data uncertainty including data acquisition device error, approximate measurement, sampling fault, transmission latency, data integration error and so on. In many cases, estimating and modeling the uncertainty for underlying data is available and many ...

The Interaction of Entropy-Based Discretization and Sample Size: An Empirical Study

An empirical investigation of the interaction of sample size and discretization – in this case the entropy-based method CAIM (Class-Attribute Interdependence Maximization) – was undertaken to evaluate the impact and potential bias introduced into data mining performance metrics due to variation in sample size as it impacts the discretization process. Of particular interest was the effect of dis...

Experiments with Decision Tree Classifiers – Discretization of Numerical Attributes

Classification algorithms are used in numerous applications every day, from assigning letter grades to students’ scores, to computerized letter recognition in mail processing. Discretization consists of applying a set of rules to reduce the number of discrete intervals from which an attribute is assigned. Discretization is generally applied to datasets whose numerical range consists of c...

Publication date: 2003
